Skip to content

feat(frost): finalize interactive signing over the first t responsive committers (7.3 t-of-included PR2/3)#4093

Merged
mswilkison merged 2 commits into
feat/frost-schnorr-migration-scaffoldfrom
feat/frost-7.3-runner-subset
Jun 19, 2026
Merged

feat(frost): finalize interactive signing over the first t responsive committers (7.3 t-of-included PR2/3)#4093
mswilkison merged 2 commits into
feat/frost-schnorr-migration-scaffoldfrom
feat/frost-7.3-runner-subset

Conversation

@mswilkison

Copy link
Copy Markdown
Contributor

What

RFC-21 Phase 7.3, the runner subset of t-of-included finalize. Today the interactive ROAST signing runner (pkg/frost/signing/roast_runner_frost_native.go) waits for all included members in round 1 (collectCommitments) and round 2 (collectShares), so one slow or offline member stalls the attempt to its ctx deadline → fail → ROAST retry. This makes an attempt finalize over the first t responsive committers.

Roles (decided by the package's signed signer_ids, PR1's field #4092)

  • CoordinatorcollectCommitments collects until exactly t (its own commitment seeded), proceeds the instant t have arrived, and builds the FROST package over that t-subset. It sets signer_ids (ascending, distinct) in signSigningPackage to the chosen members. If fewer than t ever commit, the ctx deadline fires and the run fails into the existing retry path.
  • Non-coordinator signer (in signer_ids) — runs round 2 and collects round-2 shares from the package's signer set, not the full included set.
  • Observer (committed in round 1 but not in signer_ids) — does not run round 2. It aggregates the subset's broadcast shares publicly, MarkSucceeded, and returns the signature — and still aborts the engine session on success to drop its unconsumed round-1 nonces. The success path previously suppressed the abort, which leaked an observer's nonces; fixed via a signedRound2 flag.

A round-2-silent member among the chosen t still fails the attempt → retry (existing path). Only the elected coordinator builds the package, so there is no honest divergence (receivers already filter sender==elected + attempt hash). The FROST SigningPackageBytes remains the cryptographic source of truth, so a coordinator that lies in signer_ids causes only a liveness failure (aggregate fails closed), never a wrong signature or false blame.

Inert until oversizing (MacLane's policy knob)

t-of-included is dormant until participant selection oversizes the included set past the threshold — it currently trims to exactly honestThreshold (signing_loop_legacy_selector.go:91), so the chosen subset equals the full included set and every member signs (today's behavior). The machinery is harmless and inert until then.

Hardening

The runner constructor now rejects threshold > len(includedSet) — the new round-1 loop's termination invariant (a well-formed attempt always selects ≥ threshold members; fail fast rather than silently degrade into timeout-driven retries). Found by the inline adversarial review.

Tests (all frost_native, behind pre-prod tags)

  • Extended buildInteractiveSigningHarness with runMembers to drive silence (a non-responsive member).
  • FinalizesOverResponsiveThresholdSubset — a non-coordinator offline member is excluded; the two responsive members finalize over t; the coordinator's package omits the offline member.
  • OversizedAllOnline_FinalizesOverThreshold — all online; exactly t sign (round-2 count), n−t observe and abort their nonces, all reach Succeeded; the broadcast package carries exactly t ascending signer ids.
  • ObserverAggregatesAndAbortsWithoutSigning — focused observer: aggregates, Succeeded, 0 round-2 calls, 1 abort.
  • Existing UsesEngineDerivedFrostIdentifiers retargeted to a full-included (n==t) attempt for determinism.

Validated: build+vet+test across the 5 tag combos (default / frost_native / frost_roast_retry / frost_native frost_roast_retry / frost_native frost_tbtc_signer w/ CGO), -race on pkg/frost/signing, the frost_roast_retry drive tests, repo-wide frost_native vet, pkg/tbtc build, and gofmt — all clean.

Follow-up (PR3, lands with/right after this)

signingDoneCheck (pkg/tbtc/signing_done.go:93, called from signing_loop.go:458) sets expectedSignersCount = len(attemptMembersIndexes); once oversizing is enabled, a successful attempt that omitted an offline member would hang the outer done-check. PR3 makes it threshold/package-set aware.

🤖 Generated with Claude Code

… committers (7.3 t-of-included PR2/3)

The interactive ROAST signing runner waited for ALL included members in
round 1 and round 2, so one slow or offline member stalled the attempt to
its ctx deadline -> fail -> ROAST retry. Make the attempt finalize over the
first t responsive committers instead:

- Coordinator: collectCommitments now collects until EXACTLY t (the
  threshold) and stops, builds the FROST package over that t-subset, and
  carries the chosen members in the package's signed signer_ids (PR1's
  field), ascending+distinct.
- Non-coordinator signer (in signer_ids): runs round 2 and collects round-2
  shares from the package's signer set, not the full included set.
- Observer (committed in round 1 but not in signer_ids): does NOT run
  round 2; it aggregates the subset's broadcast shares publicly, marks the
  attempt succeeded, and returns the signature - and still aborts the engine
  session on success to drop its unconsumed round-1 nonces (the success path
  previously suppressed the abort, which leaked an observer's nonces).

t-of-included stays inert until participant selection oversizes the included
set past the threshold (it currently trims to exactly honestThreshold), so
the machinery is harmless and dormant until MacLane enables oversizing. PR3
will make signingDoneCheck (pkg/tbtc) threshold/package-set aware.

Also harden the runner constructor to reject threshold > included-set size,
the new round-1 collection loop's termination invariant.

All behind the frost_native pre-prod build tags.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jun 19, 2026

Copy link
Copy Markdown

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: ef20a982-cdcd-466c-ac94-e3436172a143

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/frost-7.3-runner-subset

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

…tion evidence

Fold of Codex review P2 on #4093. recordShareMessage filtered by the signer
set before RecordShareSubmission, so an included member that is NOT in the
chosen signing subset (an observer) had its share dropped before the
collector could retain it. In a targeted coordinator-equivocation case where
that member was handed a different package and broadcasts a divergent share,
the collector is designed to keep it as EquivocationKindDivergentShare
evidence for the f+1 blame/transition path - and the signer-set filter was
discarding it.

Split the membership gate: retain (hand to RecordShareSubmission) any
INCLUDED member's well-formed share, but only COUNT signer-set shares toward
the aggregate. The collector already enforces included-membership and keeps
divergent shares separately; the runner now caches the included set
(includedMembers) and gates retention by it, restoring the pre-subset
evidence surface while keeping t-of-included counting.

Added TestInteractiveSigningRunner_RetainsObserverDivergentShareAsEvidence:
an included observer's divergent share is retained (EquivocationKindDivergentShare
emitted) but not counted.

Gemini approved the PR; this addresses the sole Codex finding.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@mswilkison mswilkison merged commit 9a2d113 into feat/frost-schnorr-migration-scaffold Jun 19, 2026
15 of 16 checks passed
@mswilkison mswilkison deleted the feat/frost-7.3-runner-subset branch June 19, 2026 21:28
mswilkison added a commit that referenced this pull request Jun 19, 2026
…f-included PR3/3) (#4094)

## What

Completes RFC-21 Phase 7.3 t-of-included finalize. With PR2 (#4093) an
interactive signing attempt finalizes over the first `t` responsive
committers of an (optionally) oversized included set. The non-subset and
offline members never broadcast a signing done check, so the outer
`signingDoneCheck` — which required a confirmation from **every**
attempt member — would hang an otherwise-successful attempt to its
timeout and force a needless retry.

This concludes the done check on a **deterministic threshold quorum**,
keeping the legacy path byte-for-byte unchanged.

## Design (locked via a Codex + Gemini consult)

My first cut completed on "the first `t` arrivals," which is
**network-order-dependent**: the resulting `latestEndBlock` (a per-node
max over a network-order-dependent subset) differs across honest nodes,
and that value feeds batch scheduling (`signBatch`: `signingStartBlock =
prev endBlock + interlude`) → the batch desyncs. The consult locked this
design:

- **Non-oversized (`included == honestThreshold`; today's selector
output and the whole coarse path)**: the legacy all-members rule is
**UNCHANGED** — wait for every attempt member, require all signatures
equal, return `max(endBlock)`.
- **Oversized (`included > honestThreshold`)**: bucket done checks **by
signature**, conclude once a bucket holds `>= honestThreshold` distinct
senders (the minimum that proves a valid threshold signature).
**Minority buckets (divergent / adversarial signatures) are ignored,
never fatal** — one bad done message can't fracture the group. The end
block is the **deterministic `attemptTimeoutBlock`** (every honest node
computes it identically), not a max over done messages.

With honest majority (`t > groupSize/2`) at most one signature bucket
can reach `t` — even under coordinator equivocation (two disjoint
`t`-subsets can't coexist when `2t > n`) — so the quorum is unique. That
means **no body-hash / proto change** is needed (bucket by the existing
signature field), and **no `signBatch` change** (the oversized path
feeds `attemptTimeoutBlock` through the existing return). The
`>1`-quorum branch is unreachable and intentionally non-fatal (no noisy
post-success failure).

## Inert until oversizing

`included == honestThreshold` today, so the legacy branch is taken and
behavior is identical to before; the quorum path only activates once
selection oversizes the set (MacLane's policy knob).

## Tests (`signing_done_test.go`)

- Legacy `TestSigningDoneCheck{,_MissingConfirmation,_AnotherSignature}`
— unchanged, still green (they use an attempt-member set of size
`honestThreshold`).
- `TestSigningDoneCheck_ThresholdSubsetConcludes` — oversized; reporters
carry different end blocks but the result is the deterministic
`attemptTimeoutBlock`.
- `TestSigningDoneCheck_OversizedIgnoresMinorityDivergentSignature` —
`t` correct + 1 divergent → concludes on the quorum, ignores the
minority (no error).
- `TestSigningDoneCheck_OversizedSplitBelowQuorumTimesOut` — 2+2 split,
no bucket reaches `t` → times out.

Validated: build+vet across default / `frost_roast_retry` /
`frost_native frost_roast_retry` / CGO; the done-check + signing-loop +
roast-transition tests with `-race`; gofmt clean. The
`signingDoneCheckStrategy` interface + `mockSigningDoneCheck` are
untouched (only the constructor signature changed).

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant